UNCLASSIFIED 

itcunmr  CLAisincAttoN  of  this  page 


T.  'HWWm  WtUSfcfifrtfKlw 

UNCLASSIFIED 


REPORT  DOCUMENTATION  PAGE 

"it,1  MMKwe; - 


2»  SECURITY  CLASSIFICATION  AUTHORITY 


2b  DECLASSIFICATION /DOWN  GRADING  SCHEDULE 


3.  DISTRIfiimON /AVAILABILITY  OF  REWT 

Approved  for  public  release;  distri¬ 
bution  unlimited. 


4  PERFORMING  ORGANIZATION  REPORT  NUM8ER(S( 


NPRDC  TR  85-10 


5  MONITORING  ORGANIZATION  REPORT  NUMBER<S} 


6«  NAME  Of  PERFORMING  ORGANIZATION 

Navy  Personnel  Research 
and  Development  Center 


6b  OFFICE  SYMBOL 

(*  aephcetdej 


7*  NAME  Of  MONITORING  ORGAMZATKJN 


41 


6c.  ADDRESS  (C*y.  State  and  ZIP  Code) 

San  Diego,  California  92152-6800 


7b  ADDRESS  (Cay.  State  end  OP  Code) 


Be  NAME  Of  FUNDING/SPONSORING  ORGANIZATION 


Office  of  Naval  Research 


8b  OFFICE  SYMBOL 

(H  applicable 

Code  200 


8.  PROCUREMENT  INSTRUMENT  IDENTIFICATION  NUMBER 


Be  ADDRESS  (City.  State  end  ZIP  Code ) 

Box  8e 

Arlington,  Virginia  22217 


10  SOURCE  OF  FUNDING  NUMBERS 


PROGRAM  ELEMENT  NO 

PROJECT  NO 

TASK  NO 

WORK  UNIT  NO 

PE62763N 

RF63-521 

806 

1 1  TITLE  (include  Security  Classification, 


A  NEW  HAZARD  FUNCTION  ESTIMATOR  OF  PERFORMANCE  TIME 


12  PERSONAL  AUTHOR) Si 

Bloxom,  Bruce 


1 3*  TYPE  OF  REPORT 

Technical 


13b  TIME  COVERED 

FROM  Oct  83 


Jul  84 


16.  SUPPLEMENTARY  NOTATION 


14  r»4TF  Of  Pfprwr  ryear,  Month.  Day) 

1984  November 


16.  PAGE  COUNT 

34 


17  COSAT1  CODES 

FIELD 

GROUP 

SUB-GROUP 

U5 

1  uy 

BA  iDCTOA/n  /P.*. 

1 8  SUBJECT  TERMS  (Continue  on  raveree  if  necessary  and  identify  by  bfock  number) 

human  performance,  performance  time,  hazard  function 


The  analysis  of  performance  time  is  used  to  study  a  wide  variety  of  manpower, 
training,  and  human  factors  problems  in  the  Navy.  The  cumulative  distribution  of 
performance  time  provides  some  indication  of  the  rate  of  performance,  but  it  can  lead  to 
false  conclusions  as  well.  One  method  of  measuring  the  rate  of  performance  with  a 
shifting  residual  basis  of  analysis,  called  hazard  analysis,  consists  of  estimating  a  hazard 
function,  which  shows  the  rate  of  performance  as  a  function  of  time.  Although  various 
procedures  have  been  developed  for  estimating  a  hazard  function,  none  of  the  procedures 
has  been  shown  to  produce  plausibly  smooth  and  precise  estimates  under  a  variety  of 
conditions.  This  research  proposed  a  constrained  quadratic  spline  as  an  estimator  of  the 
hazard  function  of  performance  time.  A  maximum  penalized  likelihood  procedure  was 
used  to  fit  the  estimator  to  a  sample  of  psychological  response  times.  The  procedure  was 
also  used  in  a  simulation  study  of  the  precision  of  the  estimator. 

21  ABSTRACT  SECURITY  CLASSIFICATION 

UNCLASSIFIED 


20.DtSTRIBLTnON /AVAILABILITY  OF  ABSTRACT 

Ek|  UNCLASSIFIED /UNLIMITED  QJ  SAME  AS  RPT 

22t  NAME  OF  RESPONSIBLE  INDIVIDUAL 


□  DTIC  USERS 


Bruce  Bloxom 


22b  TELEPHONE  (include  Area  Coda) 

( 6191  225-2231 


22c.  OFFICE  SYMBOL 

41 


DD  FORM  1473,  84  JAN 


83  APR  EDITION  MAY  BE  USED  UNTIL  EXHAUSTED 
ALL  OTHER  EDITIONS  ARE  OBSOLETE 


UNCLASSIFIED 

itCUUrfY  QASBriCATlON'y  Egngr 


LIBRARY 

RESEARCH  REPORTS  DIVISION 
NAVAL  POSTGRADUATE  SCHOOL 

MONTEREY,  CALIFORNIA  93943 


NPRDC  TR  85-10 


NOVEMBER  1984 


A  NEW  HAZARD  FUNCTION  ESTIMATOR  OF 
PERFORMANCE  TIME 


APPROVED  FOR  PUBLIC  RELEASE 
DISTRIBUTION  UNLIMITED 


NAVY  PERSONNEL  RESEARCH 
AND 

DEVELOPMENT  CENTER 
San  Diego,  California  92152 


NPRDC  TR  85-10 


November  1984 


A  NEW  HAZARD  FUNCTION  ESTIMATOR  OF  PERFORMANCE  TIME 


Bruce  Bloxom 


Reviewed  by 
Richard  C.  Sorenson 


Approved  by 
Robert  Penn 


Released  by 
0.  W.  Renard 
Captain,  U.S.  Navy 
Commanding  Officer 


Navy  Personnel  Research  and  Development  Center 
San  Diego,  California  92152-6800 


FOREWORD 


This  effort  was  conducted  within  exploratory  development  task  PE62763N  RF63-521- 
806  (Future  Technologies  for  Manpower  and  Personnel).  Part  of  this  task  is  devoted  to 
Techwatch,  a  systematic  effort  to  monitor  technological  developments  in  a  broad  range  of 
disciplines  across  both  western  and  Soviet  block  scientific  communities.  The  goal  of 
Techwatch  is  to  identify  technological  opportunities  of  potential  impact  on  manpower, 
personnel,  training,  and  human  factors.  Summaries  of  the  state  of  technology  will  be 
prepared  and  published,  from  time  to  time,  and  new  projects  will  be  proposed  to  exploit 
particularly  promising  technological  areas.  This  report  deals  with  hazard  analysis,  a 
statistical  approach  to  analyzing  performance  times.  Results  are  intended  for  researchers 
involved  in  studying  human  performance. 


3.  W.  RENARD 
Captain,  U.S.  Navy 
Commanding  Officer 


3.  W.  TWEEDDALE 
Technical  Director 
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SUMMARY 


Problem 

The  analysis  of  performance  time  is  used  to  study  a  wide  variety  of  manpower, 
training,  and  human  factors  problems  in  the  Navy.  For  example,  attrition  can  be 
measured  by  the  number  of  months  of  service  before  separation,  trainee  success  can  be 
measured  by  the  time  required  to  satisfactorily  complete  a  series  of  training  modules,  and 
the  vigilance  of  a  radar  observer  can  be  measured  by  the  length  of  time  he  or  she  can 
work  without  making  response  errors.  In  each  of  these  contexts,  the  cumulative 
distribution  of  performance  time  provides  some  indication  of  the  rate  of  performance,  but 
it  can  lead  to  false  conclusions  as  well.  In  the  attrition  example,  a  comparison  of  the 
percentages  of  high-school  graduates  and  nongraduates  at  various  points  in  their  first  tour 
of  duty  shows  a  smaller  percentage  of  nongraduates  at  each  point.  This  comparison  might 
lead  one  to  believe  that  the  attrition  from  nongraduates  is  greater  across  the  entire  term 
of  service.  However,  if  the  percentage  still  in  the  service  at  each  point  is  based  on  the 
number  remaining  just  previously,  a  very  different  result  can  be  obtained. 

One  method  of  measuring  the  rate  of  performance  with  a  shifting  residual  basis  of 
analysis,  called  hazard  analysis,  consists  of  estimating  a  hazard  function,  which  shows  the 
rate  of  performance  as  a  function  of  time.  Although  various  procedures  have  been 
developed  for  estimating  a  hazard  function,  none  of  the  procedures  has  been  shown  to 
produce  plausibly  smooth  and  precise  estimates  under  a  variety  of  conditions.  Precision 
has  been  especially  a  problem  in  the  upper  quantiles  of  the  distribution. 

Purpose 

The  purpose  of  this  research  was  to  (a)  develop  a  hazard  function  estimator  that  is 
visually  smooth  and  (b)  study  its  precision  with  various  sample  sizes  and  under  various 
shapes  of  the  hazard  function  being  estimated. 

Approach 

The  hazard  function  estimator  proposed  in  this  research  is  a  quadratic  spline,  which  is 
a  piecewise  quadratic  polynomial  constrained  to  be  continuous  and  to  have  continuous 
first-order  derivatives  where  the  pieces  are  joined,  points  that  are  commonly  termed 
knots.'  The  estimator  function  may  be  constrained  to  be  either  increasing  or  decreasing 
at  the  knots  and  either  convex  or  inflected.  A  procedure  for  fitting  the  spline  to  data  was 
proposed  and  its  use  illustrated  in  analysis  of  a  set  of  empirical  data.  To  examine  the 
precision  of  the  estimator  as  a  function  of  sample  size  and  variation  in  the  form  of  the 
hazard  function,  a  simulation  study  was  conducted. 

Results 

When  the  hazard  function  was  constrained  to  be  isotonic  or  bidirectional  and  to  have 
a  small  number  of  inflections,  the  empirical  example  demonstrated  that  a  maximum 
penalized  likelihood  procedure  can  provide  rapid  computation  of  this  kind  of  hazard 
estimator  and  can  produce  a  good  fit  to  the  data.  The  shape  of  the  hazard  estimate  in  the 
empirical  example  resembled  the  shape  of  hazard  estimates  frequently  obtained  under  two 
models  in  the  simulation  study  when  moderate  sample  sizes  (75  and  150)  were  used.  In  one 
of  those  models,  the  true  hazard  function  increased  and  then  leveled  off  at  the  75th 
percentile;  in  the  other  model,  the  true  hazard  function  increased  and  then  decreased  to 
about  three  fourths  of  its  maximum  before  leveling  off. 
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Conclusions 


The  simulation  study  indicated  that  more  substantial  decreases  (by  half  or  more)  in 
the  tail  of  the  true  hazard  function  could  be  detected  reliably  by  the  proposed  method 
when  moderate  sample  sizes  were  used.  The  method,  therefore,  has  practicality,  because 
such  decreases  have  been  found  in  empirical  studies  as  well  as  in  analyses  of  parametric 
distributions  (e.g.,  the  inverse  Gaussian  distribution).  The  study  also  indicated  that  the 
proposed  method  could  reliably  detect  smaller  decreases  in  the  true  hazard  function  and 
could  provide  rather  precise  pointwise  estimates  of  the  tail  of  that  function  when  a  large 
sample  size  (N  =  500)  was  used.  The  large  sample  size  also  resulted  in  much  more 
accurate  estimation  of  the  hazard  function  across  the  middle  range  of  the  distribution, 
but  bias  was  still  perceptible.  Finally,  the  study  indicated  that  transforming  the  data  to 
increase  the  dispersion  may  be  an  effective  means  of  improving  the  methods  precision 
with  moderate  sample  sizes.  Which  transformation  to  use  with  which  set  of  constraints  is 
a  subject  for  further  study. 
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INTRODUCTION 


Background  and  Problem 

The  analysis  of  performance  time  is  used  to  study  a  wide  variety  of  manpower, 
training,  and  human  factors  problems  in  the  Navy.  For  example,  attrition  can  be 
measured  by  the  number  of  months  of  service  before  separation,  the  success  of  a  trainee 
can  be  measured  by  the  time  required  to  complete  a  series  of  training  modules 
satisfactorily,  and  the  vigilance  of  a  radar  observer  can  be  measured  by  the  length  of  time 
he  or  she  can  work  without  making  response  errors. 

In  each  of  these  contexts,  the  cumulative  distribution  of  performance  time  provides 
some  indication  of  the  rate  of  performance,  but  it  can  lead  to  false  conclusions  as  well. 
In  the  attrition  example,  a  comparison  of  the  percentages  of  high-school  graduates  and 
nongraduates  at  various  points  in  their  first  tour  of  duty  shows  a  smaller  percentage  of 
nongraduates  at  each  point.  This  comparison  might  lead  one  to  believe  that  the  attrition 
from  nongraduates  is  greater  across  the  entire  term  of  service.  However,  if  the 
percentage  still  in  the  service  at  each  point  is  based  on  the  number  remaining  just 
previously,  a  very  different  result  can  be  obtained.  For  some  groups  of  enlistees  late  in 
their  term  of  service,  the  percentage  remaining  is  greater  for  the  nongraduates  than  for 
the  graduates.  The  less  educated  enlistees  at  this  point  may  be  making  more  of  an  effort 
to  remain,  perhaps  to  avoid  what  for  them  is  a  return  to  an  uncertain  civilian  job  market. 
This  finding  would  be  missed  by  an  inspection  only  of  the  cumulative  distribution  functions 
or  other  performance  time  summary  statistics. 

Measuring  the  rate  of  performance  with  a  shifting  residual  basis  of  analysis,  as  just 
described,  is  called  hazard  analysis.  The  analysis  consists  of  estimating  a  hazard  function, 
which  shows  the  rate  of  performance  as  a  function  of  time.  Although  various  procedures 
have  been  developed  for  estimating  a  hazard  function,  none  of  the  procedures  has  been 
shown  to  produce  plausibly  smooth  and  precise  estimates  under  a  variety  of  conditions. 
Precision,  particularly  variability,  has  been  especially  a  problem  in  the  upper  quantiles  of 
the  distribution. 

The  problem  addressed  by  this  research  may  be  stated  as  follows:  Assume  a  process 
that  is  initiated  at  time  x  =  0  and  is  completed  at  randomly  distributed  time  x  =  t.  Given 
N  independently  and  identically  distributed  observations,  Tj,  we  need  to  estimate  the  rate 

at  which  the  process  is  completed  at  each  point  on  x  >  0.  Specifically,  we  need  to 
estimate  the  conditional  density  function,  r(x)  =  f(x)/{l-F(x)},  where  f(x)  is  the  unknown 
probability  density  function  and  F(x)  is  the  unknown  cumulative  distribution  function. 
Because  of  its  use  in  reliability  theory  (e.g.,  Barlow  <5c  Proschan,  1975),  r(x)  is  commonly 
called  the  hazard  function. 

Motivation  for  estimating  a  distribution's  hazard  function  can  be  found  in  problems 
posed  in  psychology  as  well  as  in  biostatistics,  actuarial  science,  and  engineering 
reliability  (Lawless,  1983).  In  psychology,  examining  the  hazard  function  of  a  response 
time  distribution  has  been  proposed  as  an  alternative  to  examining  the  distribution's 
moments  when  one  is  trying  to  choose  among  alternative  parametric  models  of  decision 
processes  (e.g.,  Burbeck  &  Luce,  1982).  The  hazard  function  has  also  been  suggested  as 
part  of  a  nonparametric  measure  of  cognitive  capacity  (Townsend  &  Ashby,  1978).  In 
addition,  comparing  the  hazard  functions  of  two  response  time  distributions  has  been 
suggested  as  a  nonparametric  device  for  assessing  simply  whether  one  process's  response 
times  are  more  rapid  than  another's.  It  has  been  shown  that,  if  one  distribution's  hazard 
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function  dominates  another's  overall  x,  then  all  quantiles  of  the  first  distribution  are  less 
than  the  corresponding  quantiles  of  the  second  distribution  (Barlow  &  Proschan,  1975). 

Purpose 


The  purpose  of  this  research  was  to  (a)  develop  a  hazard  function  estimator  that  is 
visually  smooth,  and  (b)  study  its  precision  with  various  sample  sizes  and  under  various 
shapes  of  the  hazard  function  being  estimated. 


APPROACH 

The  hazard  function  estimator  proposed  in  this  research  is  a  quadratic  spline,  which  is 
a  piecewise  quadratic  polynomial  constrained  to  be  continuous  and  to  have  continuous 
first-order  derivatives  where  the  pieces  are  joined,  points  on  x  that  are  commonly  termed 
"knots."  Because  the  pieces  are  quadratic,  inflections,  such  as  departures  from  a  convex 
form,  can  occur  only  at  the  knots.  Also,  linear  constraints  on  the  coefficients  of  the 
polynomials  can  be  used  to  constrain  the  function  to  be  convex  or  inflected  and  either 
increasing  or  decreasing  at  the  knots  (Wright  &  Wegman,  1980). 

Two  features  of  a  quadratic  spline  suggest  the  ways  it  can  provide  reasonably  precise 
hazard  function  estimates.  The  first  feature  is  the  smoothness  that  can  be  produced  by 
linear  constraints  on  the  function.  This  smoothness  is  characteristic  of  hazard  functions 
of  many  commonly  considered  parametric  families  of  distributions:  Gamma,  generalized 
gamma,  Weibull,  inverse  Gaussian,  Rayleigh,  and  numerous  other  distributions  have  hazard 
functions  that  are  isotonic  (monotone  increasing  or  decreasing)  or  bidirectional  (increas¬ 
ing-then-decreasing  or  decreasing-then-increasing)  functions  of  x  and  that  have  a  small 
number  of  inflections  (Glaser,  1980;  Burbeck  &  Luce,  1982).  The  second  feature  that 
makes  a  quadratic  spline  a  promising  hazard  estimator  is  its  flexibility.  Second  and  higher 
order  derivatives  of  the  spline  are  not  defined  at  the  knots,  thus  allowing  the  function  to 
have  rather  pronounced  changes  in  curvature  at  those  points.  With  knots  no  closer  than  at 
every  decile,  a  quadratic  spline  can  rather  closely  approximate  even  the  very  peaked 
hazard  function  of  an  inverse  Gaussian  (Wald)  distribution.  Furthermore,  the  spline  can 
become  linear  in  the  upper  tail  of  the  distribution  as  is  necessary  when  the  random 
variable  has  an  exponential  component  (Ashby,  1982). 

Numerous  alternative  estimators  of  a  hazard  function  have  been  and  could  be 
developed  (e.g.,  Lawless,  1983;  Singpurwalla  3c  Wong,  1983).  A  higher  order  polynomial 
spline  could  be  employed  with  a  penalty  placed  on  the  spline's  roughness;  for  example,  on 
its  second  and  higher  order  derivatives  (e.g.,  Rice  <5c  Rosenblatt,  1983).  This  method 
would  clearly  be  appropriate  when  the  number  of  peaks  and  inflections  in  the  hazard 
function  is  not  known  or  is  reasonably  believed  to  be  small.  Alternatively,  a  spline  could 
be  employed  as  a  density  estimator  (e.g.,  Mendelsohn  &  Rice,  1982)  that  is  then  used  to 
compute  a  hazard  estimator;  however,  it  is  not  clear  how  that  hazard  estimator  would  be 
constrained  to  be  convex  or  isotonic  (Bloxom,  1983).  Finally,  there  have  been  proposals  of 
other  kinds  of  hazard  estimators  that  are  isotonic  or  bidirectional  (e.g.,  Bartoszynski, 
Brown,  McBride,  &  Thompson,  1981);  however,  those  estimators  are  not  constrained  to 
have  a  small  number  of  inflections. 

The  quadratic  spline  expression  proposed  in  this  research  as  a  hazard  function 
estimator  was  provided  with  constraints  that  are  sufficient  for  the  spline  to  be  isotonic  or 
bidirectional  and  convex  or  inflected.  A  procedure  for  fitting  the  spline  to  data  was 
proposed  and  its  use  illustrated  in  analysis  of  a  set  of  empirical  data.  To  examine  the 
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precision  of  the  estimator  as  a  function  of  sample  size  and  variation  in  the  form  of  the 
hazard  function,  a  small  simulation  study  was  conducted. 


RESULTS  AND  DISCUSSION 


Hazard  Estimators 


A  hazard  function,  as  defined  earlier,  completely  characterizes  a  distribution  in  the 
sense  that  the  probability  density  function  and  the  cumulative  distribution  function  can  be 
expressed  in  terms  of  the  hazard  function: 


f(x) 

=  r(x)exp{-/* 

r(u)  du} 

(1) 

F(x) 

=  1  -  exp{-/* 

r(u)  du}. 

(2) 

If  f(x)  and  F(x)  satisfy  their  definitions,  that  is,  if  f(x)^0  and  F(x)-*-  1  as  x  +  then  it  can 

be  shown  that  r(x)^0  for  all  x  and/*  r(u)  du->-®  as  x-*-®  (Barlow  &  Proschan,  1975).  In  this 

research,  a  completely  defined  hazard  function  was  defined  as  one  with  these  last  two 
properties.  Substitution  of  this  estimator  into  Equations  1  and  2  produces  estimators  of 

f(x)  and  F(x)  such  that  f(x)  ^0  and  F(x)  1  as  x-*-®. 

Quadratic  Spline 

A  formulation  of  a  quadratic  spline  that  can  provide  a  completely  defined  hazard 
estimator  is: 

r(x)  =  b-  S(x)  (3) 


2  2  2 

where  b  is  a  vector  of  K+3  coefficients,  S'  (x)  is  the  vector  { 1  ,  x  ,  x  ,  (x-a.)  ,  (x-a9)  z 

,  —  ,  (x-aj<.)+  },  0<aj<a2< — are  K  knots  joining  K+l  quadratic  pieces,  aQ  =  0  is  the 
lower  bound  of  x,  and  the  function  (x-a.)+  =  (x-a^)  if  x  >^a^  and  =  0  otherwise. 


This  formulation  has  a  number  of  useful  features.  First,  like  r(x),  the  function  is 
defined  over  the  interval  (0,®)  on  x.  Second,  because  of  the  one-to-one  correspondence 
between  coefficients  and  quadratic  terms,  r(x)  can  be  smoothed  to  be  quadratic  at  any 
knot  simply  by  setting  the  corresponding  coefficient  equal  to  zero.  Third,  obtaining 
estimates  of  f(x)  and  F(x)  requires  integrating  nothing  more  complicated  than  quadratic 


terms  (x-a.)+  .  Fourth,  obtaining  hazard  estimates  that  are  isotonic,  bidirectional,  or 

convex  requires  nothing  more  complicated  than  placing  linear  constraints  on  the  vector  of 
coefficients,  b,  during  the  estimation  of  that  vector. 


In  the  history  of  the  study  of  splines,  two  concerns  have  been  noted  regarding  the  use 
of  the  formulation  in  Equation  3.  One  concern  is  that  computation  of  Equation  3  is 
somewhat  slower  than  other  formulations,  most  notably  B-splines  (Shumaker,  1981), 
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because  the  number  of  terms  being  computed  increases  as  x  increases.  However,  in  spite 
of  this  relative  slowness,  estimating  r(x)  is  quite  rapid,  even  in  a  maximum  likelihood 
estimation  where  r(x)  is  computed  for  each  data  point  in  each  iteration.  A  second  concern 
is  that  b  becomes  unstable  if  one  knot  is  placed  very  close  to  another  knot  because  the 
corresponding  terms  of  S(x)  become  collinear.  However,  when  one  is  estimating  a  hazard 
function,  knots  do  not  generally  need  to  be  very  close  to  give  the  spline  sufficient 
flexibility  to  approximate  a  wide  variety  of  hazard  functions.  Furthermore,  even  if  a  pair 
of  knots  should  need  to  be  placed  in  close  proximity,  instability  of  b  does  not  imply 
instability  of  r(x),  which  is  the  estimate  of  interest. 

An  argument  in  favor  of  using  B-splines  instead  of  Equation  3  is  that  a  wide  variety 
of  shapes  of  functions  can  be  approximated  by  B-splines  that  (1)  have  adjacent  knots 
equated,  (2)  have  been  integrated,  or  (3)  have  their  coefficients  constrained  to  be 
nonnegative.  However,  the  formulation  in  Equation  3,  as  developed  in  this  study,  is 
sufficient  for  approximating  hazard  functions  that  are  plausible  in  a  wide  variety  of 
applications.  Furthermore,  using  B-splines  with  nonnegative  coefficients  to  approximate 
some  plausible  hazard  functions,  such  as  those  that  are  bidirectional  with  a  nonzero 
asymptote,  requires  a  mixture  of  B-splines  and  integrated  B-splines;  that  mixture  may 
require  estimating  more  coefficients  than  are  needed  when  Equation  3  is  used. 

Constraints 


If  the  quadratic  spline  in  Equation  3  is  to  be  a  completely  defined  hazard  estimator, 
linear  constraints  need  to  be  placed  on  b.  The  approach  proposed  here  places  a  set  of 
constraints  on  the  spline,  r(a.),  on  its  first  derivative,  r*  (a.),  and  on  the  first  backward 

difference  of  its  derivative,  Ar'  (a^)  =  {r'  (a.)  -  r'(a^  j)},  at  the  knots,  a^.  Each  constraint 

can  be  written  as  a  linear  function  of  the  spline's  coefficients;  from  Equation  3,  these  are: 

r(aj)  =  b'  S(a.), 

r'  (a.)  =  b'  S*(aj), 

and 


A  r'  (a.)  =  b'  (  S*(aj)  -  S*(a.  j)}, 

where  each  element  of  S*(x)  is  the  first  derivative  of  the  corresponding  element  of  S(x). 
Because  the  pieces^  of  r(x)  ^are  quadratic  between  knots,  the  constraints  also  place 
limitations  on  r(x)  ,  r'(x),  and  r"(x)  between  knots. 

Table  1  lists  classes  of  constraints  that  form  sufficient  conditions  for  r(x)  to  be 
isotonic  increasing,  isotonic  decreasing,  bidirectional  increasing-then-decreasing,  or  bi¬ 
directional  decreasing-then-increasing.  These  four  functional  shapes  are  of  interest  here 
because  they  characterize  the  hazard  functions  of  a  wide  variety  of  distributions  of  x  (see 
Burbeck  &  Luce,  1982;  or  Glaser,  1980,  for  specific  examples).  For  isotonic  r(x),  Table  1 
also  lists  subclasses  of  constraints  sufficient  for  r(x)  to  be  positive  convex  (i.e.,  to  have 
nonnegative  second  derivatives),  to  be  negative  convex  (i.e.,  to  have  nonpositive  second 
derivatives),  or  to  have  a  single  inflection  (i.e.,  to  have  exactly  one  positive  convex 
segment  and  one  negative  convex  segment).  For  bidirectional  r(x),  Table  1  also  lists 
subclasses  of  constraints  sufficient  for  r(x)  to  be  convex,  have  one  inflection,  or  have  two 
inflections. 
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Table  1 


Constraints  on  Hazard  Estimator  by  Class  of  Function 


Class 


Isotonic 

increasing 


Subclass  Constraints 

r(aQ)  =  0 

r’Uj)  >  0  ,  j  1  K 

A  f'(aK+1)  =  0 

Positive  convex  r(ag)  =  ® 

r'  (aQ)  >  0 

A  r '  (a  . )  >  0  ,  j  <  K 

J 

A  ?'(aK+1)  •  0  ,  any  a^  >  aK 

Negative  convex  r(ag)  =  ® 

Ar'(a.)  <  0  ,  j  <  K 

r '  (aK)  >  0 

A  r'(aK+1)  =  0  ,  any  aK+1  >  aK 

One  inflection  rUg)  =  ® 

r'  (aQ)  >  0 

A  r '  (a^)  >_  0  ,  j  <  i 
A  r ' (a .)  <  0  ,  i+1  <  j  <  K 

J 

r' (aK)  >  0 

A  f'(aK+,)  =  0  ,  any  aK+]  >  aR 
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Table  1  (Continued) 


Class  Subclass  Constraints 

r'(a.)  <  0  ,  j  <  K-l 
r'(aK)  =  0 

A  r'(aK+1)  =  0  ,  any  aK+]  >  aK 
r(aK)  >  0 

Positive  convex  A  r'(a.)  _>  0  ,  j 

J 

r'(aK)  =  0 
4  f'(aK+1)  ■  0  , 

r(aK)  >  0 

One  inflection 

A  r '  (a^.)  <  0  ,  j  _<  i 
A  ? 1  ( a  . )  _>  0  ,  i  +1  _<  j  £  K 

J 

r'(aK)  =  0 

A  r' (aK+1 )  =  0  ,  any  aK+1  >  a 
f(aK)  >  0 


<  K 


any  aK+1  >  a 


Isotonic 

decreasing 
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Table  1  (Continued) 


Class 


Subclass 


Constraints 


Bidirectional 

increasing-then- 

decreasing 


r(aQ)  =  0 
r'(a.)  >  0  , 
r'Uj)  <  0  , 

r'(aK)  =  0 
i  r'(aK+1)  -  0 
r(aK)  >  0 


j  i  1 

i+1  <  j  <  K  +  1 

*  any  aK+1  >  aK 


One  inflection 


r(aQ)  =  0 

A  ?' (a^)  <  0  ,  j  <  1 
A  r 1 (a  )  >  0  ,  i+1  <  j  <  K 

j  "  " 

r'(aK)  =  0 

A  r'(aK+1)  =  0  ,  any  aK+]  >  a 
r(aK)  >  0 


Two  inflections 


r(aQ)  =  0 
r'(a0)  >  0 
Ar'(a.)  >  0  , 
A  r*  (a j )  1  0  , 
A  r '  (aj )  ^  0  , 
r'(aK)  =  0 

A  (aK+1 )  =  0 
r(aK)  >  0 


j  1  1 

i+1  j  m 
m+1  1  j  K 

.  an*  aK+l  >  aK 
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Table  1  (Continued) 


Class  Subclass  Constraints 

r1  (a .)  £  0  ,  j  £’i-l 


Bidirectional 

decreasing-then- 

increasing 


Positive  convex 


One  inflection 


r'(a1)  =  0 
r(a.)  >0 

?'  (a.)  _>  0  ,  1+1  £  j  £  K 
A  r1  (aK+1 )  =  0  ,  any  aR+1  >  aK 

A  r ' (a .)  >  0  ,  j  <  K 

J 

r(ai)  >  0  ,  0  <  i  <  K 

f‘,(ai)  =  0 

A  ?'(aK+1)  =  0  »  any  aK+l  >  aK 
A  r '  (a .)  >  0  ,  j  <  m 

A 

r(a.. )  £  0  ,  0  <  i  <  m 
r'(a.)  =  0 

A  ?'  (a .)  <  ,  m+1  £  j  <  K 

J 

r' (aK)  >  0 

A  r'(aK+1)  ■  0  ,  any  »K+, ’>  aK 
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Each  set  of  constraints  in  Table  1  is  sufficient  for  a  completely  defined  r(x).  If  r(x)  is 
isotonic  and  increasing,  the  sufficient  conditions  require  that  r(x)  be  nonnegative  at  the 
lower  bound,  a^,  and  have  a  nonnegative  slope  at  the  lower  bound  and  at  all  knots,  a^  >  ag. 


Because  the  function  is  quadratic  between  knots,  these  constraints  are  sufficient  for  its 
slope  to  be  nonnegative  between  knots  as  well  as  at  them.  If  r(x)  is  isotonic  and 
decreasing,  the  sufficient  conditions  for  a  completely  defined  estimator  require  that  r(x) 
be  greater  than  zero  at  and  have  a  nonpositive  slope  at  the  lower  bound  and  at  all 


knots,  a.  >  ag.  These  constraints  make  the  slope  of  r(x)  nonpositive  between  knots  as  well 

as  at  them.  The  sufficient  conditions  for  a  completely  defined  r(x)  also  require  that  r(x) 
be  linear  with  a  nonnegative  slope,  for  increasing  r(x),  or  with  a  zero  slope,  for  decreasing 
r(x),  where  x  >  a?<.,  that  is,  in  the  upper  tail  of  the  distribution. 


If  r(x)  is  bidirectional  and  increasing-then-decreasing,  the  sufficient  conditions  in 
Table  1  for  a  completely  defined  r(x)  require  that  r(x)  be  nonnegative  at  the  lower  bound, 
have  a  nonnegative  slope  at  the  lower  bound  and  at  all  knots  a.  ,  j  =  1, — ,  i  for  some  i  < 

(K-2),  have  a  nonpositive  slope  at  all  knots  a.  ,  j  =  i+1, — ,K-1,  have  a  flat  slope  and  be 

1  A  . 

positive  at  a^,  and  be  linear  with  a  zero  slope  where  x  >  a^.  If  r(x)  is  bidirectional  and 

A 

decreasing-then-increasing,  the  conditions  in  Table  1  require  that  r(x)  have  a  nonpositive 


slope  at  the  lower  bound  and  at  all  knots  a^  ,  j 

with  a  flat  slope  at  a.,  have  a  nonnegative  slope  at  all  knots  a.  ,  j  =  i+1, — ,K,  and  be  linear 
where  x  >  a^.  1  1 


1, — ,i-l  for  some  i  <  K,  be  nonnegative 

i. 

1 


A 

Besides  being  sufficient  conditions  for  a  completely  defined  r(x),  subclasses  of 
constraints  in  Table  1  limit  the  number  of  inflections  in  r(x)  by  placing  restrictions  on  the 
first  backward  difference  in  the  slope  at  the  knots.  For  example,  one  set  of  sufficient 
conditions  for  an  r(x)  with  no  inflections  is  the  subclass  of  constraints  for  an  increasing 
positive  convex  function.  This  subclass  constrains  all  of  the  quadratic  segments  between 
knots  to  be  positive  convex  by  requiring  the  first  backward  difference  in  slope  at  all  knots 
aj  ,  j  =  1, — ,K,  to  be  nonnegative.  Another  set  of  sufficient  conditions  for  an  r(x)  with  no 


inflections  is  the  subclass  of  constraints  for  an  increasing  negative  convex  function;  this 
subclass  requires  the  first  backward  difference  in  slope  to  be  nonpositive  at  all  knots,  a.  , 

j  =  1, — ,K,  and,  thus,  constrains  the  quadratic  segments  between  knots  to  be  negative 


convex. 


Sufficient  conditions  for  r(x)  to  have  one  inflection  are  subclasses  of  constraints  in 
which  the  first  backward  difference  in  slope  is  nonnegative  (or  nonpositive)  for  all  knots, 

a.  ,  j  =  1, — ,i,  and  is  nonpositive  (or  nonnegative)  for  all  knots  a.  ,  j  =  i+1, — ,K.  These 
'  A  .  J 

constraints  form  an  r(x)  that  is  positive  convex  below  a.  and  negative  convex  above  a^. 

Similarly,  sufficient  conditions  for  r(x)  to  have  two  inflections  are  subclasses  of 
constraints  that  make  the  function  positive  (or  negative)  convex  up  to  some  knot,  a., 

negative  (or  positive)  convex  between  knots  a.  and  a  ,  i  <  m  <  K,  and  positive  (or 
negative  convex  above  am.  1  m 
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Parameter  Estimation 


Two  problems  were  encountered  in  an  application  of  the  hazard  estimator:  (a) 
selecting  the  location  of  the  knots,  a.,  and  (b)  selecting  a  function  to  optimize  when 
estimating  the  coefficients,  b.  ' 

Knots 


One  consideration  in  selecting  the  location  of  the  knots  is  that  they  be  placed  so  that 
the  true  r(x)  can  be  closely  approximated  by  r(x).  In  practice,  an  optimal  approach  to  this 
problem  (e.g.,  de  Boor,  1978)  is  difficult  because  the  true  r(x)  is  unknown.  However,  one 
can  distribute  a  moderate  number  of  knots,  either  at  equal  intervals  (Shumaker,  1981)  or 
at  equally  spaced  quantiles  on  x  (Wahba,  1976),  so  that  knots  are  likely  to  be  placed  in 
regions  where  r(x)  departs  markedly  from  a  quadratic  polynomial  form. 

A  second  consideration  in  selecting  the  location  of  the  knots  is  that  there  be  enough 
data  between  knots  to  avoid  severe  sampling  error  in  the  pieces  of  the  spline.  Based  on 
practical  experience  in  fitting  splines  to  data,  Wold  (1974)  suggested  that  there  be  at  least 
four  or  five  data  points  between  each  pair  of  knots.  Lii  and  Rosenblatt  (1975)  illustrated 
how  extremely  unstable  a  spline  (density)  estimator  can  become  in  segments  where  the 
data  are  sparse. 

In  the  empirical  example  and  in  the  simulations  that  will  be  described  later,  a  knot  is 
placed  at  each  of  the  nine  deciles  of  the  sample's  distribution.  This  placement  distributes 
the  knots  widely  on  the  distribution  of  x  as  is  necessary  for  close  approximation  of  a 
variety  of  forms  of  r(x).  Also,  it  is  an  attempt  to  avoid  severe  sampling  error  across  r(x): 
When  the  sample  size  is  at  least  50,  Wold's  recommended  minimum  number  of  data  points 
is  between  each  pair  of  knots.  Further  control  of  sampling  error  is  sought  by  constraining 
r(x)  to  be  linear  above  the  ninth  decile,  a^;  this  constraint  is  included  in  each  set  of 
constraints  in  Table  1. 

Coefficients 

A  natural  function  to  optimize  when  estimating  the  spline's  coefficients,  b,  is  the 
likelihood  function 

n  f(T.) 

i 

or,  more  practically,  the  log-likelihood  function 

E  In  f(T.). 

i 

One  attractive  feature  of  such  an  approach  is  the  close  connection,  through  substitution  in 
Equation  1,  between  the  estimator,  r(x),  and  the  function  being  optimized.  Another 
attractive  feature  is  that,  if  each  constraint  on  r(x)  is  incorporated  into  a  penalty 
function,  Tp(b),  is  added  to  the  log  likelihood.  For  example,  in 

g  =  E  In  f(T  )  +  E  T  (b),  (4) 

i  p  p 
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then  optimizing  g  has  a  Bayesian  interpretation.  This  interpretation  assumes  that 


exp{  Z  Y  (b)} 

P  P 

is  proportional  to  the  prior  joint  distribution  of  b.  Then  the  "penalized"  likelihood, 

G  =\T[  f(T.)}  .  exp  |  Z  Y  (b)} 
i  p  p 


is  proportional  to  the  posterior  joint  distribution  of  b.  From  this  perspective,  maximizing 
g  results  in  the  same  b  as  maximizing  the  posterior  likelihood. 

Using  Equation  4  requires  the  selection  of  penalty  functions,  Yp(b),  but  finding 

practical  functions  does  not  seem  to  be  difficult.  Fiacco  and  McCormick  (1968}  and 
Mylander,  Holmes,  and  McCormick  (1973)  developed  an  algorithm  that  uses  Yp(b)  =  - 

$(b)2/u  where  $(£)  is  an  equality  constraint;  for  example,  $>(b)  =  r'  (a^,)  =  0,  and  Yp(b)  =  u 

In  0  (b)  where  0(b)  is  a  nonnegativity  constraint  (e.g.,  0(b)  =  r'  (a_)^>  0).  The  constant  u  is 

-16  ^ 

chosen  to  be  small  (e.g.,  2  ),  so  that  its  effect  on  Equation  4  is  negligible  when  the 
constraints  are  satisfied).  Using  these  penalty  functions  results  in  estimates  of  b  that 
satisfy  the  constraints  and  yield  close  fits  to  the  data,  as  described  in  the  numerical 
example  (see  Empirical  Example). 


With  the  penalty  function  approach  shown  in  Equation  4  and  linear  constraints  of  the 
kind  given  in  Table  1,  maximizing  Equation  4  can  be  accomplished  by  any  one  of  a  number 
of  optimization  algorithms.  The  algorithm  by  Mylander  et  al.  (1973)  was  used  for  the 
numerical  example  and  the  simualtion  study  that  follows;  it  employs  the  Gauss-Newton 
method  of  optimization  that  requires  first-  and  second-order  partial  derivatives  of  the 
log-likelihood;  from  Equations  1  and  3  these  are 


T5  {Z  f(T.)}  =  Z  '  b'  S(Ti)}"1_S(Ti)  -  Z  V(T.) 
i  i  i 

and 

^  {  Z  f(T  )}  =  -  Z  b-  S(Tj)}-2  S(Tj)  S'Orp, 

—  i  i 

respectively,  where  v(T.)  is  a  vector  of  elements  of  S(T.)  that  have  been  integrated  over 
|  0,  Tj}.  With  double  precision  of  an  IBM  3033,  this  algorithm  converges  on  an  optimum  in 
about  10  seconds  with  N  =  75  and  in  about  35-40  seconds  with  N  =  500. 

Choosing  Among  Alternative  b 

In  most  applications,  using  more  than  one  class  or  subclass  of  constraints  of  the  kind 
shown  in  Table  1  is  plausible  a  priori.  It  then  becomes  important  to  decide  empirically 
which  constraints  are  most  plausible.  For  example,  in  Burbeck  and  Luce  (1982),  it  was 
important  to  decide  empirically  whether  a  distribution's  hazard  function  was  isotonic 
increasing  or  peaked;  that  is,  bidirectional  increasing-then-decreasing. 
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When  alternative  sets  of  constraints  are  plausible,  g  in  Equation  4  can  be  optimized 
for  each  of  the  sets,  resulting  in  alternative  estimates  of  b.  A^natural  extension  of  this 
approach  is  to  select  as  the  most  plausible  estimate  that  b  for  which  g  is  largest. 
Although  this  approach  is  used  in  the  following  numerical  example,  the  approach  is 
problematic  because  the  dimensionality  of  b  is  not  well  defined  and  can  vary  across  sets 
of  constraints.  When  some  of  the  inequality  constraints  placed  on  b  are  on  their 
boundaries,  it  is  unclear  what  the  effecive  number  of  parameters  is.  In  such  a  case,  it  is 
unclear  how  to  adjust  comparisons  of  g  for  variations  in  the  dimensionality  of  b.  Some 
alternatives  were  considered  in  the  simulation  study. 

Empirical  Example 


To  illustrate  the  procedure  just  described,  response  times  collected  by  Kohfeld, 
Santee,  and  Wallace  (1981)  were  used  to  obtain  r(x).  The  data  were  response  times 
collected  from  a  single  subject  in  a  single  session  with  a  Donders  multistage  decision  task 
on  which  the  subject  was  highly  practiced.  Knots  aj, — ,a^  were  placed  at  the  nine  sample 

deciles,  .226,  236,  .239,—,  .295  seconds,  respectively.  The  lower  bound,  an,  was  set  equal 
to  0. 


In  analyzing  these  data,  one  question  is  whether  the  hazard  function  is  an  increasing 
or  a  peaked  function  of  x;  that  is,  whether  the  likelihood  of  a  response  increases  or 
increases-then-decreases  as  the  subject  goes  longer  without  responding  (Burbeck  &  Luce, 
1982).  Under  the  constraints  for  an  isotonic  increasing  r(x)  in  Table  1,  optimizing 
Equation  4  yielded  g  =  212.40.  Under  the  constraints  for  a  bidirectional  increasing-then- 
decreasing  r(x),  the  estimate  was  constrained  to  be  nondecreasing  from  knots  a^  to  a., 

nonincreasing  from  knots  a^+j  to  a^,  and  flat  at  and  above  a^,  the  ninth  decile;  optimizing 

Equation  4  with  i  =  2, — ,7  yielded  values  of  g  =  211.23,  211.23,  211.30,  211.27,  211.18,  and 
211.44  respectively.  Because  all  of  these  latter  values  of  g  were  less  than  the  g  obtained 
with  the  constraints  for  an  isotonic  increasing  r(x),  the  increasing  hazard  function  was 
accepted  as  more  plausible  than  the  increasing-then-decreasing  hazard  function.  Figure  1 
shows  the  isotonic  increasing  hazard  estimate  and,  from  substitution  in  Equation  1,  its 
density  and  cumulative  distribution  function  estimates  (plotted  with  the  sample  cumula¬ 
tive  step  function).  As  can  be  seen  from  the  plot  of  the  cumulative  distribution  function, 
the  fit  to  the  data  is  quite  close. 

Given  that  the  hazard  function  appears  to  be  isotonic  increasing,  we  ask  where  an 
inflection,  if  any,  occurs  in  the  function.  The  two  inflections  in  r(x)  in  Figure  1  may  be 
viewed  with  some  scepticism  because  most  commonly  used  parametric  distributions  with 
increasing  hazard  functions  have  no  more  than  one  inflection  in  those  functions. 

Three  subclasses  of  constraints  in  Table  1  limit  an  isotonic  increasing  hazard 
estimate  to  no  more  than  one  inflection.  Using  the  constraints  for  the  positive  convex  or 
the  negative  convex  function  results  in  a  hazard  estimate  with  no  inflection.  The 
constraints  for  the  one-inflection  functions  specify  a  knot,  a.,  below  which  r(x)  is  positive 

convex  and  above  which  r(x)  is  negative  convex.  To  select  the  most  plausible  estimate 
here,  Equation  4  was  optimized  under  the  positive  convex  set  of  constraints,  the  negative 
convex  set  of  constraints,  and  eight  (i  =  1, — ,8)  sets  of  positive-then-negative  convex 
constraints.  The  largest  value  of  g  was  211.55,  which  was  obtained  under  the  positive- 
then-negative  convex  constraints  with  i  =  2.  Figure  2  shows  this  hazard  estimate  and  its 
density  and  cumulative  distribution  function  estimates  (plotted  with  the  sample  cumula¬ 
tive  step  function).  It  is  interesting  to  note  that,  even  though  the  upper  tail  of  r(x)  in 
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Sample  cumulative  distribution  function  for  100  response  times  of  a  subject  in  a 
multistage  decision  task  (Kohfeld  et  al.,  1981). 


Figure  1.  Estimates  oMsotonic  hazard  (r  (x)),  density  (f  (x)),  and 
cumulative  (F  (x))  distribution. 
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Sample  cumulative  distribution  function  for  100  response  times  of  a  subject  in  a 
multistage  decision  task  (Kohfeld  et  al.,  1981). 


Figure  2.  Estimates  of  one-inflection  hazard  (r'  (x)),  density  (f  (x)), 
and  cumulative  (F  (x))  distribution  functions. 
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Figure  2  is  much  less  steep  than  the  upper  tail  of  f(\)  in  Figure  1  and  has  no  inflection,  the 
fit  of  F(x)  to  the  data  in  the  cumulative  step  function  is  nearly  as  close  as  in  Figure  1. 
Thus,  if  it  is  assumed  that  the  true  hazard  function  has  only  one  inflection,  the  inflection 
in  the  upper  tail  of  r(x)  in  Figure  1  may  be  attributable  to  error  of  estimation.  This 
conclusion  makes  sense  when  one  examines  r(x)  as  a  function  of  f(x)  and  F(x)  in  r(x)  = 
f(x)/{  l-F(x)},  where  it  is  seen  that  even  a  small  amount  of  error  in  F(x)  can  produce  a 
large  error  in  r(x)  as  F(x)-*-  1. 

Simulation  Study 

As  shown  in  the  preceding  numerical  example,  the  proposed  method  of  hazard 
estimation  can  provide  a  close  fit  to  a  set  of  empirical  data.  However,  the  example  does 
not  indicate  how  precisely  the  method  recovers  the  true  hazard  function.  The  simulation 
study  was  designed  to  provide  a  partial  assessment  of  the  precision  of  the  method.  It 
examines  how  accurately  a  variety  of  increasing  and  increasing-then-decreasing  hazard 
functions  are  estimated  with  samples  of  the  size  often  employed  in  psychological 
experiments. 

Method 


Four  hazard  functions  (see  Figure  3)  define  the  distributions  sampled  in  this  study. 
The  first  distribution,  Model  A,  has  the  hazard  function 

r .  (x)  =  17.9  x  -  22.4  x2  -  17.5  (x  -  .4)  2  +  79.8  (x  -  .55)  2  -  39.9  (x  -  .7)  2. 

A  +  +  + 

Model 


Figure  3.  Hazard  functions  for  four  simulation  models.  Each 
function  is  shown  up  to  the  95th  percentile  of  the  model's 
response  time  distribution. 
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This  function  has  important  features  in  common  with  some  large-sample  estimates  of 
hazard  functions  obtained  in  Burbeck  and  Luce  (1982):  The  function  increases  up  to  the 
sixth  decile  and  then  decreases  to  the  eighth  decile,  where  it  approaches  an  asymptote 
that  is  half  the  maximum  of  the  function.  The  second  distribution,  Model  B,  has  the 
hazard  function 

r«(x)  =  12.5  x  -  12.5  x,  -  12.5  (x  -  .5)  2  +  50.0  (x  -  .625)  2  -  25.0  (x  -  .75)  2. 

This  function  resembles  r^(x)  with  the  exception  of  its  asymptote,  which  is  three  fourths 
the  maximum  of  the  function.  The  third  distribution,  Model  C,  has  the  hazard  function 

r^(x)  =  8.9  x  -  6.8  x2  +  6.8  (x  -  .66)+2. 

This  function  increases  to  an  asymptote  at  the  distribution's  75th  percentile.  Its  negative 
convex  form  and  flat  upper  tail  mimic  the  hazard  functions  of  some  parameterizations  of 
the  gamma  and  Weibull  distributions.  The  fourth  distribution,  Model  D,  has  the  hazard 
function 


rD(x)  =  4.6  x. 

This  function  is  the  linear  hazard  function  of  the  Rayleigh  distribution,  which  has  enjoyed 
some  popularity  in  reliability  engineering  studies.  It  is  included  here  to  provide  a  model  in 
which  r(x)  is  increasing  in  the  upper  tail  of  the  distribution. 

Two  sample  sizes,  75  and  150,  were  used  in  this  study.  Although  sample  sizes  larger 
than  these  are  sometimes  available  in  psychological  experiments,  these  represent  the 
approximate  numbers  of  observations  often  obtained  in  a  single  experimental  condition. 

For  each  of  the  four  models  and  each  of  the  two  sample  sizes,  20  samples  were  drawn 
using  random  seeds  and  a  uniform  random  number  generator  LLRANDOMII  (Lewis  & 

T- 

Uribe,  1981).  Sample  values,  T-,  were  obtained  by  inverting  F(Tj)  =  1  -  exp  {/q1  r(u)  du}. 

Twenty  replications  do  not  provide  a  very  precise  estimate  of  the  distribution  of  r(x)  at 
each  x.  However,  they  can  provide  results  that  indicate  regions  of  x  where  bias  or 
sampling  variation  in  r(x)  is  quite  large. 

The  procedure  used  to  estimate  r(x)  in  each  replication  was  identical  to  the  procedure 
used  in  the  first  phase  of  the  numerical  example.  Knots  were  placed  at  the  nine  sample 
deciles.  The  function  g  in  Equation  4  was  maximized  under  the  constraints  for  an  isotonic 
increasing  r(x),  using  the  notation  gj  for  this  maximum  of  Equation  4.  The  function  g  in 

Equation  4  was  also  maximized  under  the  constraints  for  an  increasing-then-decreasing 
r(x)  with  a.  at  knots  7,  6,  5, — .  Whichever  of  these  placements  of  aj  produced  the  largest 

value  of  Equation  4,  g^,  provided  the  r(x)  that  was  accepted  as  the  most  plausible 

increasing-then-decreasing  estimate.  The  choice  between  the  increasing  r(x)  and  the 
increasing-then-decreasing  r(x)  was  based  on  the  contrast,  c  =  gj  -  g^.  As  in  the 

numerical  example,  the  increasing  r(x)  was  chosen  if  c^0;  the  increasing-then-decreasing 
r(x)  was  chosen  otherwise.  Two  other  criteria  for  c  were  tried  and  were  found  to  select 
the  correct  set  of  constraints  less  accurately. 
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Three  methods  were  used  to  assess  the  precision  of  the  hazard  estimates  for  each 
model  and  sample  size.  One  method  was  to  examine  the  proportion  of  replications  in 
which  r(x)  was  correctly  chosen  as  increasing  (IHF)  or  increasing-then-decreasing  (IDHF). 
The  true  hazard  functions  in  Models  A  and  B  are  IDHF;  in  Models  C  and  D,  they  are  IHF. 
The  second  method  was  to  examine  the  integrated  square  discrepancy  between  r(x)  and 
r(x)  over  a  finite  range  on  x.  When  averaged  across  replications,  this  measure  is  the 
sample  analogue  of  the  mean  integrated  square  error  (MISE)  sometimes  used  in  studies  of 
asymptotic  properties  of  function  estimators  (e.g.,  Kronmal  &  Tarter,  1968;  Hallu  1982). 
The  third  method  of  assessing  the  precision  of  r(x)  was  to  plot^the  mean  of  the  20  r(x)'s  as 
a  function  of  x.  Also,  a  high-  and  low-order  statistic  of  the  r(x)'s  was  plotted  at  each  of 
10  points  on  x.  In  contrast  to  the  first  two  methods,  this  method  was  intended  to  indicate 
approximately  how  bias  and  sampling  variation  in  r(x)  change  across  the  range  of  x. 

Results 


The  first  section  of  Table  2  shows  the  proportion  of  replications  in  which  r(x)  was 
correctly  classified  as  IHF  or  IDHF  when  c  =  gj  -  gj^  was  compared  with  zero.  For 

Models  A  and  B,  this  is  the  proportion  of  replications  in  which  c  <  0;  for  Models  C  and  D, 
this  is  the  proportion  of  replications  in  which  c>_0x  The  proportion  was  quite  high  for  all 
models  except  Model  B,  where  almost  half  of  the  r(x)  were  incorrectly  chosen  to  be  IHF. 
It  is  not  surprising  that  the  error  rate  for  Model  B  is  greater  than  for  Model  A,  because 
r^(x)  decreases  much  less  than  r^(x)  and  thus  more  resembles  an  IHF  function.  It  is 

somewhat  surprising  that  the  error  rate  for  Model  C  is  so  low,  considering  that  r^.(x)  is 

level  above  the  75th  percentile.  If  r(x)  were  unbiased,  one  would  expect  that  sampling 
error  would  result  in  some  of  the  r^fx)  being  IDHF.  However,  other  results,  to  be 

reported  below,  suggest  that  r(x)  has  a  bias  that  tends  to  make  it  IHF. 


Table  2 

Results  of  the  Use  of  Three  Log-Likelihood  Criteria:  Proportions  of 
Replications  That  Correctly  Classify  the  Hazard  Estimate 


Criterion  for 

C  =  gI  "  gID 

Sample 

Size 

Proportions  by  Simulation  Model 

Increasing-Then- 

Decreasing 

Increasing 

Model  A 

Model  B 

Model  C 

Model  D 

Zero 

75 

1.00 

.60 

.95 

1.00 

150 

1.00 

.55 

1.00 

1.00 

Akaike 

75 

«  _ 

1.00 

.15 

(1974) 

150 

.95 

1.00 

-- 

Schwartz 

75 

_  _ . 

1.00 

.00 

(1978) 

150 

-- 

1.00 

.10 

-- 

Note.  N  =  20  replications  for  each  model  under  each  sample  size. 
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Table  2  also  shows  the  proportion  of  replications  in  which  r(x)  was  correctly  classified 
when  c  was  compared  with  two  criteria  other  than  zero  (Akaike,  1974;  Schwartz,  1978). 
Both  criteria  take  into  account  that  gj  and  are  optimized  under  different  effective 

numbers  of  parameters.  When  gj  is  optimized,  the  slope  of  r(x)  is  not  constrained  to  equal 

zero  in  the  upper  tail.  Thus,  obtaining  gj  requires  estimating  one  more  parameter  than 

does  obtaining  g^,  under  which  r(x)  is  constrained  to  have  a  zero  slope  in  the  upper  tail. 

Because  of  this  difference  between  gj  and  gj^,  a  criterion  proposed  by  Akaike  (1974) 

indicates  that  r(x)  should  be  chosen  as  IHF  if  c  >^1.00  instead  of  zero.  A  Bayesian 
criterion  proposed  by  Schwartz  (1978)  indicates  that  r(x)  should  be  chosen  as  IHF  if  c_>  .5 
log  N.  The  proportion  of  replications  in  which  these  criteria  resulted  in  the  correct 
classification  of  r(x)  was  quite  high  for  Model  B  using  each  of  these  criteria.  However, 
both  criteria  quite  frequently  misclassified  r^Xx)  as  IDHR.  Even  the  high  proportion  of 

correct  classifications  obtained  using  Akaike's  criterion  with  150  observations  was  not 
sustained  with  a  sample  size  of  500  (see  Additional  Analyses).  Thus,  it  appears  that  these 
two  criteria  are  not  necessarily  improvements  over  the  zero  criterion  for  c  in  the  present 
context,  although  they  have  been  found  to  be  quite  useful  in  contexts  where  inequality 
constraints  are  not  used  (e.g.,  Neftci,  1982). 

In  addition  to  examining  the  proportion  of  r(x)  that  were  correctly  classified,  the 
precision  of  each  r(x)  was  measured  by  the  integrated  square  discrepancy  (ISD), 

ISD  =  (sy)’1  /Qy  {  r(x)  -  r(x}2  dx 

where  y  is  the  95th  percentile  on  the  true  F(x).  Although  both  r(x)  and  r(x)  are  defined 
over  |  0,«>},  that  range  of  integration  is  not  used  here  because  the  linear  tails  of  those 
functions  would  result  in  ISD  -*■  °°  as  y  -*■  <*>  whenever  r(x)  £  r(x)  in  the  upper  tail.  The  scale 
factor, 

s  =  y'1  /0y  {  r(x)2}  dx, 

was  chosen  to  adjust  ISD  for  differences  in  the  magnitude  of  r(x)  across  models. 

Table  3  gives  the  mean  ISD  for  the  20  replications  under  each  model  and  sample  size. 
Not  surprisingly,  the  larger  sample  size  resulted  in  a  more  accurate  estimate  (i.e.,  a 
smaller  mean  ISD)  under  each  model,  although  it  is  not  much  more  accurate  in  the  case  of 
the  IHF  models,  C  and  D.  Also,  accuracy  was  greater  when  r(x)  was  lower  in  the  upper 
tail:  Model  A  has  the  lowest  tail;  Model  D  has  the  highest  tail. 


To  further  examine  the  precision  of  the  hazard  estimates,  the  mean  r(x)  for  each 
model  and  sample  size  was  plotted  as  a  function  of  x.  Figure  4  shows  these  results  as 
broken  lines,  plotted  along  with  the  corresponding  true  r(x)  as  solid  lines  up  to  the  95th 
percentile  of  x.  Included  on  each  graph  are  vertical  lines  connecting  the  3rd  and  18th 
order  statistics  (14th  and  86th  preentiles)  of  the  20  r(x);  these  are  given  at  the  5th,  15th, 
25th, — ,  95th  percentiles  on  x.  These  results  suggest,  not  surprisingly,  that  positive  bias 
and  sampling  variability  are  increasing  functions  of  r(x)  and  decreasing  functions  of  N. 
The  results  also  show  a  pronounced  positive  bias  occurring  in  the  upper  tail  of  Models  B, 
C,  and  D,  consistent  with  the  finding  that  these  models  have  the  highest  mean  ISD.  This 
bias  is  also  consistent  with  the  frequency  of  gj  >  gj^  for  these  models.  Finally,  there  is  a 
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Table  3 


Mean  Integrated  Square  Discrepancy  (ISD)  of  the 
Hazard  Estimate  by  Simulation  Model 


Mean  ISD 


Sample 

Model 

Model 

Model 

Model 

Size 

A 

B 

C 

D 

75 

.085 

.1 77 

.199 

.267 

150 

.053 

.081 

.120 

.199 

Note.  N  =  20  replications  for  each  model  under  each  sample  size. 


visually  small  but  reliable  negative  bias  in  the  lower  tail  for  each  model  and  sample  size. 
This  result,  considered  along  with  the  positive  bias  in  the  upper  tail,  suggests  that  the 
sample  quantiles  may  be  less  dispersed  on  x  than  are  the  corresponding  population 
quantiles  for  the  rather  broad  class  of  distributions  sampled  here. 

An  important  question  in  this  analysis  is  whether  the  proposed  r(x)  is  sufficiently 
precise  to  provide  useful  estimates,  given  the  sample  sizes  employed  here.  The  results  in 
Figure  4  do  not  make  the  procedure  look  generally  promising,  given  a  sample  size  of  75. 
However,  fairly  accurate  detection  of  a  decreasing  r(x)  in  the  upper  tail  was  obtained  for 
one  model,  A,  with  this  sample  size.  That  model  has  an  r(x)  that  resembles  many  large- 
sample  empirical  estimates  obtained  by  Burbeck  and  Luce  (1982)  and  that  decreases  as 
much  as  the  hazard  function  of  some  parameterizations  of  the  often-proposed  Wald,  or 
inverse  Gaussian,  distribution.  Thus,  with  a  sample  size  as  small  as  75,  the  proposed 
procedure  may  be  useful  for  detecting  a  substantially  decreasing  r(x)  even  if  it  is  not 
useful  for  precise  estimation  of  that  function. 

Given  a  sample  size  of  150,  Figure  4  suggests  that  the  proposed  r(x)  can  provide  fairly 
unbiased  estimation  up  through  the  eighth  decile  when  r(x)  is  IHF.  However,  when  r(x)  is 
IDHF,  the  estimates  tend  to  be  incorrectly  classified  as  IHF  if  r(x)  decreases  only  slightly 
(Model  B),  and  are  markedly  biased  and  variable  at  the  peak  of^r(x)  if  r(x)  decreases  more 
substantially  (Model  A).  When  r(x)  decreases  substantially,  r(x)  provides  a  fairly  good 
estimate  of  where  the  peak  of  r(x)  occurs  on  x,  but  it  does  not  provide  a  reliable  estimate 
of  the  height  of  that  peak. 

Additional  Analyses 

Although  the  preceding  analyses  provide  useful  results,  three  additional  analyses  were 
conducted  to  probe  the  effects  of  (a)  using  a  substantially  larger  sample  size,  (b) 
transforming  the  data  to  increase  the  dispersion  on  x,  and  (c)  constraining  r(x)  to  have  no 
more  than  one  inflection  (as  in  the  empirical  example). 

To  probe  the  effect  of  using  a  larger  sample  size,  20sreplications  with  N  =  500  were 
obtained  for  Models  B  and  C.  The  sets^of  constraints  on  r(x)  were  the  same  as  those  used 
in  the  simulation  study.  For  Model  B,  r(x)  was  correctly  classified  as  IDHF  in  90  percent 
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N  =  75 


N  =  150 


Model 

A 


Model 

B 


_  Hazard  function. 

-  Mean  hazard  estimate. 

Variation  of  hazard  estimate. 

Figure  4.  Hazard  function,  mean  hazard  estimate,  and  variation  of 
hazard  estimate  for  four  simulation  models  (N  =  75,  150). 
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Model 

C 


Hazard  function. 

Mean  hazard  estimate. 
Variation  of  hazard  estimate. 


Figure  4.  Continued. 


of  the  replications;  for  Model  C,  r(x)  was  correctly  classified  as  IHF  in  95  percent  of  the 
replications.  The  use  of  Akaike's  (1974)  and  Schwartz's  (1978)  criteria  for  c  =  gj  - 

resulted  in  all  r(x)  being  incorrectly  classified  for  Model  C.  These  results  lend  further 
support  to  the  use  of  the  zero  criterion  for  c  when  choosing  between  IHF  and  IDHF 
constraints. 


Both  the  global  and  pointwise  measures  of  precision  indicate  that  r(x)  was  much 
improved  when  the  substantially  larger  sample  size  was  used.  The  mean  ISD  indices  were 
.009  and  .005  for  Models  B  and  C  respectively,  compared  with^  values  of  .081  and  .120 
obtained  with  samples  of  150.  Figure  5  shows  r(x),  the  mean  r(x)  and  the  3rd  and  18th 
order  statistics  of  r(x)  for  the  5th,  15thx — ,  95th  percentiles  on  x.  The  larger  sample  size 
substantially  improved  the  precision  of  r(x)  in  the  upper  tail,  resulting  in  the  smaller  mean 
ISD  as  well  as  the  better  overall  visual  proximity  to  r(x).  However,  using  the  larger 
sample  size  did  not  completely  remove  what  appears  to  be  a  positive  bias  in  r(x)  in  the 
middle  of  the  distribution  and  a  negative  bias  between  the  middle  and  each  of  the  tails. 
Further  research  is  needed  to  ascertain  whether  the  bias  is  due  to  the  nature  of  the 
constraints  or  due  to  the  bias  that  is  nearly  always  present  in  the  order  statistics  (Johnson 
&  Kotz,  1970). 

To  probe  the  effect  of  increasing  the  dispersion  on  x,  the  40  replications  with  N  =  75 
from  Models  B  and  C  were  transformed  by  replacing  each  order  statistic,  T^,  with  a  point 

halfway  between  the  adjacent  order  statistics,  T^  ^  and  T^+jy  This  transformation 

tends  to  shift  each  data  point  in  the  direction  of  lower  local  density.  In  a  unimodal 
distribution,  it  tends  to  shift  data  toward  the  tails.  In  estimating  r(x)  from  the 
transformed  data,  the  sets  of  constraints  were  the  same  as  those  used  in  the  simulation 
study  reported  above.  Also,  for  each  replication,  the  zero  criterion  for  c  was  used  in 
choosing  between  the  IHF  and  IDHF  constraints. 


A 

All  measures  of  precision  indicated  that  r(x)  was  at  least  somewhat  improved  by 
transforming  the  data.  For  Model  B,  r(x)  was  correctly  classified  as  IHF  in  85  percent  of 
the  replications;  for  Model  C,  r(x)  was  correctly  classified  as  IDHF  in  90  percent  of  the 
replications.  The  first  of  these  percentages  is  an  improvement  over  the  60  percent 
accuracy  obtained  with  the  untransformed  data  (see  Table  1).  The  mean  ISD  indices  were 
.096  and  .168  for  Models  B  and  C  respectively;  each  of  these  is  better  than  the 
corresponding  indices,  .177  and  H99,  obtained  with  the  untransformed  data.  An  inspection 
of  the  pointwise  distribution  of  r(x)  indicated  that,  for  both  models,  transforming  the  data 
reduced  both  the  bias  and  variability  of  the  estimator  in  the  upper  tail  of  x  but  had  little 
effect  on  precision  elsewhere.  Even  so,  the  improvement  in  the  upper  tail  is  encouraging, 
because  that  is  where  r(x)  was  least  precise  before  the  transformation.  A  modification  of 
the  transformation,  say,  by  differentially  weighting  T  ...  and  T  ,■  may  improve  the 

precision  still  further.  '1_  '  '1+ 


To  assess  the  effect  of  constraining  r(x)  to  have  no  more  than  one  inflection,  the  20 
samples  of  150  from  Model  C  were  used  to  obtain  r(x)  under  limited-inflection  constraints, 
as  in  the  second  part  of  the  empirical  example  described  previously.  The  mean  ISD  for 
these  estimates  was  .060,  compared  with  the  mean  ISD  of  .120  obtained  with  the  same 
data  when  r(x)  was  constrained  only  ^to  be  isotonic  increasing.  Figure  6  shows  r(x),  the 
mean  r(x)  and  the  order  statistics  of  r(x)  in  the  manner  depicted  in  Figures  4  and  5.  When 
Figure  6  is  compared  with  the  results  in  Figure  4  for  Model  C  with  N  =  150,  it  is  clear  that 
constraining  r(x)  to  have  no  more  than  one  inflection  can  increase  the  precision  in  the 
upper  tail.  However,  this  increase  in  precision  is  at  the  expense  of  additional  positive  bias 
in  r(x)  between  the  seventh  and  eighth  deciles.  Thus,  although  a  less  inflected  r(x)  has  a 
more  plausible  visual  smoothness,  it  is  not  necessarily  more  precise  pointwise  than  an 
estimate  constrained  only  to  be  isotonic. 
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Figure  5.  Hazard  function,  mean  hazard  estimate,  and  variation  of 
hazard  estimate  for  simulation  Models  B  and  C  (N  =  500). 
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Figure  6.  Hazard  function,  mean  hazard  estimate,  and  variation  of 

hazard  estimate  for  simulation  Model  C  (N  =  150)  with 
limited-inflection  constraints. 


CONCLUSIONS 

When  the  hazard  function  was  constrained  to  be  isotonic  or  bidirectional  and  to  have 
a  small  number  of  inflections,  the  empirical  example  demonstrated  that  a  maximum 
penalized  likelihood  procedure  can  provide  rapid  computation  of  this  kind  of  hazard 
estimator  and  can  produce  a  good  fit  to  the  data.  The  shape  of  the  hazard  estimate  in  the 
empirical  example  resembled  the  shape  of  hazard  estimates  frequently  obtained  under  two 
models  in  the  simulation  study  when  moderate  sample  sizes  (75  and  150)  were  used.  In  one 
of  those  models,  the  true  hazard  function  increased  and  then  leveled  off  at  the  75th 
percentile;  in  the  other  model,  the  true  hazard  function  increased  and  then  decreased  to 
about  three  fourths  of  its  maximum  before  leveling  off.  Thus,  in  light  of  the  simulation 
results,  it  is  difficult  to  decide  whether  the  empirical  example's  true  hazard  function  is  an 
increasing  or  an  increasing-then-slightly-decreasing  function. 

The  simulation  study  indicated  that  more  substantial  decreases  (by  half  or  more)  in 
the  tail  of  the  true  hazard  function  could  be  detected  reliably  by  the  proposed  method 
when  moderate  sample  sizes  were  used.  The  method,  therefore,  has  practicality,  because 
such  decreases  have  been  found  in  empirical  studies  as  well  as  in  analyses  of  parametric 
distributions  (e.g.,  the  inverse  Gaussian  distribution).  The  study  also  indicated  that  the 
proposed  method  could  reliably  detect  smaller  decreases  in  the  true  hazard  function  and 
could  provide  rather  precise  pointwise  estimates  of  the  tail  of  that  function  when  a  large 
sample  size  (N  =  500)  was  used.  The  large  sample  size  also  resulted  in  much  more 
accurate  estimation  of  the  hazard  function  across  the  middle  range  of  the  distribution, 
but  bias  was  still  perceptible.  Finally,  the  study  indicated  that  transforming  the  data  to 
increase  the  dispersion  on  x  may  be  an  effective  means  of  improving  the  method's 
precision  with  moderate  sample  sizes.  Which  transformation  to  use  with  which  set  of 
constraints  is  a  subject  for  further  study. 
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